Solutions of $Ax = \lambda Bx$ ¹

M. R. Hestenes ² and W. Karush ³

The problem is to determine characteristic numbers and vectors for the problem $Ax = \lambda Bx$, where $A$, $B$ are $n \times n$ Hermitian matrices. A generalized gradient $\eta$ is defined. From a first approximation $x_0$, a second approximation $x_1 = x_0 - \alpha\eta$ is determined. Successive approximations, with appropriate $\alpha$'s, converge to a solution.



1. Introduction 

Let $A$, $B$ be Hermitian matrices of order $n$ with $B$ positive definite. Then the characteristic vectors of the equation

$$Ax = \lambda Bx \qquad (1)$$

are the critical points of the "Rayleigh quotient"

$$\mu(x) = \frac{(x, Ax)}{(x, Bx)}, \qquad x \neq 0, \qquad (2)$$

and the corresponding values of the quotient are the characteristic values $\lambda$. In particular the minimum (maximum) of $\mu$ is the least (greatest) characteristic value of (1). Our purpose is to discuss a method of finding the solutions of (1) that is based upon this observation and that avoids a transformation of the problem.

The method is an iterative one that may be described briefly as follows.⁴ With each non-null vector $x$ we associate a vector $\eta(x)$ that is in a certain sense the gradient of $\mu$ at $x$. We then pass from one approximation $x$ to the next $x'$ by means of the formula

$$x' = x - \alpha\eta, \qquad \alpha > 0,$$

where the scalar $\alpha$ may depend upon $x$. The gradient used here is determined by the equation

$$G\eta = Ax - \mu(x)Bx, \qquad (3)$$

where $G$ is an arbitrary positive definite Hermitian matrix. In computational practice $G$ would be selected so that its inverse $G^{-1}$ is known (e.g., $G = I$, the identity matrix). In section 4 it will be shown that this method is convergent if the scalars $\alpha(x)$ are appropriately chosen, and in section 6 two feasible schemes for this choice will be described. In general, convergence is established only to some, possibly intermediate, characteristic value (and vector). Under special hypotheses this will be the least characteristic value (see section 5).

The method has several computational advantages. It avoids a transformation of problem (1).⁵ It minimizes round-off errors by beginning each step with a new initial vector. The calculations at each stage of the iteration are simple and identical in form with those of the preceding one. The method is thereby particularly suited to high-speed automatic computing machines. However, it appears to converge too slowly to be of use for hand calculation.

1 The preparation of this paper was sponsored (in part) by the Office of Naval Research.

2 Univ. of California at Los Angeles and NBS at Los Angeles.

3 Univ. of Chicago and NBS at Los Angeles.

4 It is an extension of one used by the authors in the case that $A$ is real symmetric [...]

When one or more characteristic vectors are known the method may be modified so as to yield a new characteristic vector (see section 7). This is achieved by appropriately altering eq. (3) for the gradient.

For arbitrary complex matrices $C$, $D$ it is of interest to know when the problem

$$Cx = \lambda Dx \qquad (4)$$

may be transformed to one of type (1). Several characterizations ⁶ are given in section 8, some of which are of computational value.

2. Preliminary Results 

In this section we shall state some definitions and assemble some well-known facts on matrices. No proofs will be given.

By a vector we understand an $n$-tuple $x = (a_1, a_2, \ldots, a_n)$ of complex numbers. We deal with the space of such vectors over the scalar field of complex numbers. We let

$$(x, y) = a_1\bar{b}_1 + a_2\bar{b}_2 + \cdots + a_n\bar{b}_n, \qquad y = (b_1, b_2, \ldots, b_n),$$

where $\bar{c}$ denotes the complex conjugate of the scalar $c$. Thus $(x, y) = \overline{(y, x)}$. The length of $x$ is $|x| = (x, x)^{1/2}$. If $C$ is an arbitrary matrix then

$$(x, Cy) = (C^*x, y)$$

where $Cy$ has the usual meaning, and $C^*$ is the conjugate transpose of $C$. A matrix $H$ is Hermitian if and only if

$$H^* = H.$$

In this case $(x, Hx)$ is a real number. We shall say that two vectors $x$ and $y$ are $H$-orthogonal in case

$$(x, Hy) = (y, Hx) = 0.$$

Two sets of vectors are $H$-orthogonal in case each vector of one set is $H$-orthogonal to each vector of the other. By orthogonality is meant $I$-orthogonality, with $I$ the identity matrix.

5 Such a transformation may, for example, involve finding the inverse of $A$ or $B$. A more feasible scheme computationally is to write $B = LL^*$ with $L$ triangular ($L^*$ = conjugate transpose of $L$). The latter method is discussed on p. 159 to 160 of Fox, Huskey, and Wilkinson, Notes on the solution of algebraic linear simultaneous equations, Quart. J. Mech. and Applied Math. p. 147 to 173 (1948).

6 These results are closely related to some of H. Wielandt, Zur Abgrenzung der selbstadjungierten Eigenwertaufgaben. I. Räume endlicher Dimension, Math. Nachrichten 2, No. 6, 328 (1949).

A matrix $G$ is positive definite in case it is Hermitian and

$$(x, Gx) > 0 \quad \text{whenever } x \neq 0.$$

Let $G$ be positive definite. There exist positive numbers $m(G)$ and $M(G)$ such that

$$m(G)|x|^2 \leq (x, Gx) \leq M(G)|x|^2. \qquad (5)$$

Also, we have the inequality

$$|(x, Gy)|^2 \leq (x, Gx)(y, Gy), \qquad (6)$$

the equality holding if and only if $x$ and $y$ are linearly dependent. Further, the matrix $G^{-1}$ is positive definite, and there exists a positive definite matrix $G_1$ such that $G = G_1^2$.

We turn now to problem (4), where $C$ and $D$ are arbitrary matrices. The number $\lambda'$ is a characteristic number (root, value) of (4) in case there is a non-null vector $y'$ such that

$$Cy' = \lambda' Dy'.$$

We allow the characteristic value $\lambda' = \infty$; in this case $Dy' = 0$. We say that $y'$ is a characteristic vector belonging to $\lambda'$. For a problem of type (1), where $A$ is Hermitian and $B$ is positive definite, every characteristic value is finite and real. Let

$$\lambda_1 < \lambda_2 < \cdots < \lambda_k$$

be the $k$ distinct real characteristic roots of (1), and let $L_j = L(\lambda_j)$ be the characteristic manifold belonging to $\lambda_j$, that is, the linear subspace spanned by the characteristic vectors belonging to $\lambda_j$. Then any two subspaces belonging to distinct $\lambda$'s are $B$- and $A$-orthogonal, and have only the null vector in common. Further, every vector $z$ has a unique decomposition of the form

$$z = z_1 + z_2 + \cdots + z_k, \qquad z_j \text{ in } L_j.$$

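For problem (1) the important extremum principle is

$$\lambda_j = \min \mu(x), \qquad x \ B\text{-orthogonal to } L_1, \ldots, L_{j-1},$$

$$\lambda_j = \max \mu(x), \qquad x \ B\text{-orthogonal to } L_k, \ldots, L_{j+1}.$$

In particular,

$$\lambda_1 \leq \mu(x) \leq \lambda_k, \qquad x \neq 0.$$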
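The bounds $\lambda_1 \leq \mu(x) \leq \lambda_k$ can be checked numerically. What follows is a minimal sketch in Python with NumPy and is not part of the original paper; the helper names `inner` and `rayleigh` and the random test matrices are illustrative assumptions. The reference characteristic values are obtained through the factorization $B = LL^*$ mentioned in footnote 5, purely for comparison.

```python
import numpy as np

def inner(x, y):
    # (x, y) = sum_i a_i * conj(b_i); np.vdot conjugates its first argument.
    return np.vdot(y, x)

def rayleigh(A, B, x):
    # mu(x) = (x, Ax) / (x, Bx), real for Hermitian A and positive definite B.
    return (inner(x, A @ x) / inner(x, B @ x)).real

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2                      # Hermitian (illustrative data)
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)            # Hermitian positive definite

# Reference characteristic values via B = L L*, used only as a check.
L = np.linalg.cholesky(B)
Linv = np.linalg.inv(L)
lam = np.linalg.eigvalsh(Linv @ A @ Linv.conj().T)

# Sample the quotient at random vectors and compare with the extreme roots.
samples = [rayleigh(A, B, rng.standard_normal(n) + 1j * rng.standard_normal(n))
           for _ in range(200)]
inside = all(lam[0] - 1e-10 <= m <= lam[-1] + 1e-10 for m in samples)
```

Every sampled quotient should fall between the smallest and largest reference value, up to rounding.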



3. The Gradient 

The direction for which the directional derivative of the function $\mu$, given by (2), is a maximum will now be calculated. This optimal direction will be determined relative to the inner product $(x, Gy)$ corresponding to an arbitrary, fixed positive definite matrix $G$. The generality of an arbitrary inner product has computational significance as well as theoretical interest; in practice it is limited to matrices $G$ whose inverses are known. The iteration method and convergence theorems that are to follow later depend only upon the final formula that will be obtained for the maximizing direction, not upon the derivation of the formula; the derivation is intended to suggest the motivation for the method.

For fixed vectors $x \neq 0$ and $\delta x \neq 0$, consider the function $\mu(x + \epsilon\,\delta x)$ for real $\epsilon$. By a simple calculation we find that at $\epsilon = 0$,

$$\frac{d\mu}{d\epsilon} = \frac{2R\{(\delta x, \xi)\}}{(x, Bx)}, \qquad x \neq 0, \qquad (7)$$

where

$$\xi = \xi(x) = Ax - \mu(x)Bx,$$

and $R\{c\}$ denotes the real part of $c$. We therefore seek that vector $\delta x$ for which

$$R\{(\delta x, \xi)\} = \max, \qquad (\delta x, G\,\delta x) = 1. \qquad (8)$$

$\eta$ is defined by the equation

$$G\eta = \xi = Ax - \mu Bx. \qquad (9)$$

Then, using (6),

$$R\{(\delta x, \xi)\} = R\{(\delta x, G\eta)\} \leq |(\delta x, G\eta)| \leq (\delta x, G\,\delta x)^{1/2}(\eta, G\eta)^{1/2} = (\eta, G\eta)^{1/2}.$$

It is an easy matter to verify that $\delta x = \eta/(\eta, G\eta)^{1/2}$ is the unique normalized vector for which equality holds between the first and last terms above. Hence this vector is the desired solution of (8). Introducing a change in normalization for convenience, $\eta$ is termed the gradient of $\mu$ (with respect to $G$).
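Formula (7) and the definition (9) lend themselves to a quick numerical check. The sketch below is an illustration only, assuming NumPy, a diagonal $G$ (so that its inverse is trivially known), and random test data; the function name `gradient` is hypothetical. It compares a central difference of $\mu(x + \epsilon\,\delta x)$ with the right side of (7), and also evaluates $(x, G\eta)$, which equals $(x, \xi)$ and vanishes by the definition of $\mu$.

```python
import numpy as np

def inner(x, y):
    # (x, y) = sum_i a_i * conj(b_i)
    return np.vdot(y, x)

def rayleigh(A, B, x):
    return (inner(x, A @ x) / inner(x, B @ x)).real

def gradient(A, B, G, x):
    # xi = A x - mu(x) B x, and eta solves G eta = xi, eqs. (7) and (9).
    xi = A @ x - rayleigh(A, B, x) * (B @ x)
    return np.linalg.solve(G, xi), xi

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)
G = np.diag(np.arange(1.0, n + 1))            # a G whose inverse is known
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
dx = rng.standard_normal(n) + 1j * rng.standard_normal(n)

eta, xi = gradient(A, B, G, x)

# Central difference of mu(x + eps*dx) at eps = 0 versus formula (7).
h = 1e-6
numeric = (rayleigh(A, B, x + h * dx) - rayleigh(A, B, x - h * dx)) / (2 * h)
formula = 2 * inner(dx, xi).real / inner(x, B @ x).real

# (x, G eta) = (x, xi) = (x, Ax) - mu (x, Bx) = 0.
orthogonality = abs(inner(x, G @ eta))
```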

We shall have occasion to use the gradient relative to a side condition

$$(\delta x, z) = 0, \qquad z \text{ fixed}. \qquad (10)$$

Here we wish to solve (8) relative to (10). $\zeta$ is defined by the equation

$$G\zeta = \xi + hz = G\eta + hz \qquad (11)$$

where $h$ is determined so that $(\zeta, z) = 0$. Thus

$$h = -\frac{(G^{-1}\xi, z)}{(G^{-1}z, z)}. \qquad (12)$$

Then, in light of (10), $(\delta x, \xi) = (\delta x, G\zeta)$. As before, it follows that the maximizing vector is proportional to $\zeta$, and this vector is chosen to be the gradient.
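The construction of the constrained gradient by (11) and (12) can be sketched as follows. This is a hedged illustration with NumPy; `constrained_gradient` is a hypothetical name and the matrices are random test data, not taken from the paper. The two computed quantities confirm that $\zeta$ satisfies the side condition $(\zeta, z) = 0$ and the defining equation $G\zeta = \xi + hz$.

```python
import numpy as np

def inner(x, y):
    # (x, y) = sum_i a_i * conj(b_i), linear in the first argument.
    return np.vdot(y, x)

def constrained_gradient(A, B, G, x, z):
    # zeta solves G zeta = xi + h z with (zeta, z) = 0, eqs. (11) and (12).
    mu = (inner(x, A @ x) / inner(x, B @ x)).real
    xi = A @ x - mu * (B @ x)
    eta = np.linalg.solve(G, xi)              # eta = G^{-1} xi
    Ginv_z = np.linalg.solve(G, z)
    h = -inner(eta, z) / inner(Ginv_z, z)     # eq. (12)
    return eta + h * Ginv_z, xi, h

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)
G = np.diag(np.arange(1.0, n + 1))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

zeta, xi, h = constrained_gradient(A, B, G, x, z)
side_condition = abs(inner(zeta, z))                      # should vanish
defining_eq = np.linalg.norm(G @ zeta - (xi + h * z))     # should vanish
```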

More generally, with several independent side conditions

$$(\delta x, z_1) = 0, \quad (\delta x, z_2) = 0, \quad \ldots,$$

as the gradient the vector $\zeta$ is obtained, where

$$G\zeta = G\eta + h_1z_1 + h_2z_2 + \cdots \qquad (13)$$

with the $h$'s determined so that $\zeta$ is orthogonal to each of the $z$'s. Thus the $h$'s are the solutions of

$$h_1(G^{-1}z_1, z_1) + h_2(G^{-1}z_2, z_1) + \cdots = -(\eta, z_1)$$
$$h_1(G^{-1}z_1, z_2) + h_2(G^{-1}z_2, z_2) + \cdots = -(\eta, z_2) \qquad (14)$$
$$\cdots$$

in which the determinant of the coefficients of the $h$'s is nonzero, by the positive definiteness of $G^{-1}$ and the independence of the $z$'s.

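The change in $\mu$ when we pass from a vector $x \neq 0$ to the vector $x - \alpha\eta$ will now be computed, where $\alpha$ is some real number and $\eta(x)$ is given by (9). Assume that $x$ is not characteristic, i.e., $\eta \neq 0$. Direct calculation leads to

$$\mu(x) - \mu(x - \alpha\eta) = \frac{(\eta, G\eta)}{(x, Bx)}\,f(x, \alpha), \qquad x \neq 0,\ \eta \neq 0, \qquad (15)$$

where

$$f(x, \alpha) = \frac{\alpha(2 - ps\alpha)}{1 - 2q\alpha + r\alpha^2} \qquad (16)$$

with

$$p = \frac{(\eta, B\eta)}{(\eta, G\eta)}, \qquad q = \frac{R\{(x, B\eta)\}}{(x, Bx)}, \qquad r = \frac{(\eta, B\eta)}{(x, Bx)}, \qquad s = \mu(\eta) - \mu(x).$$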

Our iteration procedure takes the following form. An initial vector $x_0 \neq 0$ is given. Then the sequence $\{x_i\}$ is determined by

$$x_{i+1} = x_i - \alpha_i\eta_i, \qquad \eta_i = \eta(x_i), \qquad (17)$$

where the real number $\alpha_i$ is to be specified at each step. In order to be sure that (17) determines a well-defined sequence we must verify that $x_i \neq 0$ for every $i$. To this end suppose that for a given $j$, $x_j \neq 0$; notice that from (9)

$$(x_j, G\eta_j) = (Gx_j, \eta_j) = 0.$$

Hence by (17)

$$(x_{j+1}, Gx_{j+1}) = (x_j, Gx_j) + \alpha_j^2(\eta_j, G\eta_j) > 0, \qquad (18)$$

since $G$ is positive definite. Hence $x_{j+1} \neq 0$. Since $x_0 \neq 0$, the sequence is well defined. From (15) we have

$$\mu(x_i) - \mu(x_{i+1}) = \frac{(\eta_i, G\eta_i)}{(x_i, Bx_i)}\,f_i(\alpha_i), \qquad f_i(\alpha) = f(x_i, \alpha), \qquad (19)$$

this equation holding whenever $\eta_i \neq 0$.
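The iteration (17) and the norm identity (18) can be sketched in a few lines. This is a minimal illustration with NumPy, assuming $G = I$, a constant step $\alpha = 0.01$ chosen arbitrarily, and random test matrices; none of these choices come from the paper. The loop records the largest relative deviation from (18) over the run.

```python
import numpy as np

def inner(x, y):
    return np.vdot(y, x)

def rayleigh(A, B, x):
    return (inner(x, A @ x) / inner(x, B @ x)).real

rng = np.random.default_rng(4)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)
G = np.eye(n)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # x_0 != 0
alpha = 0.01                                               # illustrative constant step
max_defect = 0.0
for _ in range(50):
    mu = rayleigh(A, B, x)
    eta = np.linalg.solve(G, A @ x - mu * (B @ x))         # eq. (9)
    x_next = x - alpha * eta                               # eq. (17)
    # Right and left sides of (18).
    predicted = inner(x, G @ x).real + alpha ** 2 * inner(eta, G @ eta).real
    actual = inner(x_next, G @ x_next).real
    max_defect = max(max_defect, abs(actual - predicted) / predicted)
    x = x_next
```

Since $(x_j, G\eta_j) = 0$, the cross term drops out and $(x_i, Gx_i)$ can only grow, so the iterates never reach the null vector.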



4. General Convergence Theorems 

In this section and the next will be established convergence theorems under a certain general assumption on the real sequence $\{\alpha_i\}$. In the section following these two we shall describe two effective ways of meeting these conditions. For the present, we assume that the sequence has the property that there exist real positive constants $b_2$ and $c$ such that

$$0 < \alpha_i \leq b_2, \quad \text{and} \quad 0 < c \leq f_i(\alpha_i) \ \text{whenever } \eta_i \neq 0. \qquad (20)$$

To simplify our discussion we wish to dispose of the trivial case in which $\eta_j = 0$ for some first index $j$. In this instance $x_j$ is a characteristic vector belonging to the characteristic value $\mu_j$, and $x_i = x_j$, $i \geq j$. The results to be given in this, and the next, section are now immediately verifiable. Hence we shall proceed on the basis that

$$\eta_i \neq 0, \qquad i = 0, 1, 2, \ldots.$$

In particular (19) holds for every $i$.

Theorem 1. Suppose that the sequence $\{\alpha_i\}$ satisfies (20). Then $\mu_i = \mu(x_i)$ is a decreasing sequence that converges to a characteristic value $\lambda$ of (1). Also $\lim_{i\to\infty}(x_i, Gx_i) = d \neq 0$; in particular the lengths $|x_i|$ are bounded and bounded away from zero. Every accumulation point of $\{x_i\}$ is a characteristic vector in $L(\lambda)$.

To make the proof, we notice first by (19) and (20) that the sequence $\{\mu_i\}$ is decreasing, since $B$ and $G$ are positive definite. Since this sequence is bounded from below by the minimum characteristic value $\lambda_1$, it follows that it has a limit, call it $\mu'$. By (5), (19), (20) there is a positive constant $e$ such that

$$t_i \leq e\,[\mu(x_i) - \mu(x_{i+1})], \qquad t_i = \frac{(\eta_i, G\eta_i)}{(x_i, Gx_i)} > 0.$$

Hence $t_i \to 0$; in fact

$$\sum_{i=0}^{\infty} t_i < \infty. \qquad (21)$$

From (18) we derive

$$(x_{i+1}, Gx_{i+1}) = (x_0, Gx_0)\prod_{j=0}^{i}(1 + \alpha_j^2 t_j).$$

It is well known that the product on the right converges if $\sum_{j=0}^{\infty}\alpha_j^2 t_j$ does. The latter condition holds by (20) and (21). This establishes the existence of the limit $d$ and the asserted property of $|x_i|$.

It now follows that $\eta_i \to 0$ and hence, by (9),

$$Ax_i - \mu_i Bx_i \to 0. \qquad (22)$$

Let $y'$ be any limit point of $\{x_i\}$; there exists at least one. Then $y'$ is a non-null vector which, by (22), satisfies

$$Ay' - \mu' By' = 0.$$

Thus $\mu'$ is a characteristic value and $y'$ belongs to $L(\mu')$. This completes the proof.

Theorem 2. Let (20) hold. If the sequence $\{x_i\}$ has an isolated accumulation point $y$, then $\{x_i\}$ converges to $y$. Consequently if the characteristic root $\lambda$ of the preceding theorem is simple (i.e., $\dim L(\lambda) = 1$), then $\{x_i\}$ converges to a characteristic vector.

Let $y$ be an isolated accumulation point and let $P$ be the set of remaining accumulation points. Let $S_1$ and $S_2$ be open sets with disjoint closures with $y$ in $S_1$ and $P$ in $S_2$. There is an $i'$ such that for $i > i'$, $x_i$ lies in the union of the two open sets. Let $d' > 0$ be the greatest lower bound of $|u - v|$ for $u$ in $S_1$, $v$ in $S_2$. Since $\eta_i \to 0$ by the preceding proof, we may, by (20) and (17), choose an $i'' > i'$ such that $|x_{i+1} - x_i| < \tfrac{1}{2}d'$ for $i > i''$. Hence if $x_i$ is in $S_1$, $i > i''$, then $x_{i+1}$ is in $S_1$. It follows that for some $j$, $x_i$ is in $S_1$ for all $i > j$. Thus $P$ is null. This establishes the first conclusion of the theorem.

Let $\lambda$ be a simple root. From Theorem 1 every accumulation point $y$ must satisfy $(y, Gy) = d$. There are exactly two vectors in $L(\lambda)$ which satisfy this condition. By the first part of the theorem $\{x_i\}$ must converge to one of them. This completes the proof.

Theorem 3. Let (20) hold, and let $\lambda$ be the characteristic value of Theorem 1. Then there is a sequence of vectors $\{y_i\}$ in $L(\lambda)$ such that

$$\lim_{i\to\infty}(x_i - y_i) = 0.$$

For the proof we utilize the decomposition

$$x_i = y_i + z_i, \qquad y_i \text{ in } L(\lambda) \text{ and } z_i \ B\text{-orthogonal to } L(\lambda).$$

Now suppose $z_i$ does not converge to zero. Then some subsequence $\{z_i'\}$ converges to $z \neq 0$. The corresponding subsequence $\{x_i'\}$ has a further subsequence $\{x_i''\}$ which converges to $y$ in $L(\lambda)$, by Theorem 1. By the above decomposition, the corresponding subsequence $\{y_i''\}$ converges, necessarily to a vector $y''$ in $L(\lambda)$. Hence,

$$z = \lim_{i\to\infty}(x_i'' - y_i'') = y - y''.$$

Thus $z$ is both in $L(\lambda)$ and $B$-orthogonal to this subspace. Hence $z = 0$. This contradiction completes the proof.

It is worthy of notice that insofar as the iteration 
described in this paper is to be used as a practicable 
numerical method for finding some characteristic 
vector of (1), then the conclusion of the preceding 
theorem is as effective as the assertion that the sequence $\{x_i\}$ actually converges. For the theorem asserts that the sequence will come, and remain, within an arbitrarily small distance of some characteristic vector, this vector possibly varying with $x_i$.

5. Convergence to the Least Characteristic 
Vector 

Because our iteration method is a gradient procedure which decreases $\mu(x)$, it is to be expected that under appropriate hypotheses on the problem (1) and the matrix $G$ the sequence $\mu_i$ will converge to $\lambda_1$, the minimum characteristic value. We shall show under a rather strong assumption that such convergence will take place, and further, that the sequence $\{x_i\}$ will converge, whether or not $\lambda_1$ is simple.

In passing, we remark that although for definiteness the iteration has been formulated so as to produce a decreasing sequence $\mu_i$, a slight modification in (20) produces an increasing sequence; the change is

$$0 < -\alpha_i \leq b_2 \quad \text{and} \quad f_i(\alpha_i) \leq c < 0.$$

The results of the previous section hold in this case, and under the forthcoming additional hypothesis of this section, convergence will take place to $\lambda_k$, the greatest characteristic value, and to a corresponding characteristic vector.
Lemma 1. Suppose that

$$AG^{-1}B = BG^{-1}A. \qquad (23)$$

Then problem (1) and the problem

$$Gx = \nu Bx \qquad (24)$$

have a common complete set of characteristic vectors $y_1, y_2, \ldots, y_n$ with $(y_p, By_q) = \delta_{pq}$ = Kronecker delta.

To prove this $B = H^2$ is written with $H$ positive definite and Hermitian. Then (1) and (24), respectively, are equivalent to $H^{-1}AH^{-1}z = \lambda z$ and $H^{-1}GH^{-1}z = \nu z$ where $z = Hx$. It is easily verified that the condition (23) is equivalent to the commutativity of the Hermitian matrices $H^{-1}AH^{-1}$ and $H^{-1}GH^{-1}$. It follows by standard theory that these matrices are simultaneously reducible to diagonal form by a unitary transformation; hence they share a complete orthonormal set of characteristic vectors $z_1, z_2, \ldots, z_n$. The desired vectors $y_p$ are now given by $z_p = Hy_p$.

Theorem 4. Assume that (20) and (23) hold. For a given initial vector $x_0$, let $m$ be the smallest integer $j$ ($j = 1, 2, \ldots, k$) for which $x_0$ is not $B$-orthogonal to the characteristic manifold $L_j$. Then

$$\lim \mu_i = \lambda_m, \qquad \lim x_i = y \neq 0 \ \text{with } y \text{ in } L_m.$$

We employ the basis of Lemma 1 to write

$$x_0 = a_{01}y_1 + a_{02}y_2 + \cdots + a_{0n}y_n.$$

It is assumed that the basis has been ordered so that the first $r_1$ vectors span $L_1$, the next $r_2 - r_1$ vectors span $L_2$, etc. (We take $r_0 = 0$.) By multiplying each vector of the basis by $\pm 1$ we may assume that

$$a_{0p} \geq 0, \qquad p = 1, 2, \ldots, n.$$

Let

$$\tau_1 \leq \tau_2 \leq \cdots \leq \tau_n$$

be the characteristic numbers of (1) corresponding to the successive vectors of the basis. Thus $\tau_1 = \cdots = \tau_{r_1} = \lambda_1$, $\tau_{r_1+1} = \cdots = \tau_{r_2} = \lambda_2$, etc. Furthermore let $\nu_p$ be the characteristic number of (24) corresponding to $y_p$. Thus

$$\nu_p = (y_p, Gy_p) > 0$$

and

$$G^{-1}By_p = \nu_p^{-1}y_p, \qquad p = 1, 2, \ldots, n.$$

Using (17), (9), (1) and the preceding equality it is found that

$$x_i = a_{i1}y_1 + a_{i2}y_2 + \cdots + a_{in}y_n$$

with

$$a_{i+1,p} = a_{ip}\{1 + \alpha_i\nu_p^{-1}(\mu_i - \tau_p)\}. \qquad (25)$$

Also

$$a_{ip} = (y_p, Bx_i). \qquad (26)$$

Since $x_0$ is by hypothesis $B$-orthogonal to each of the subspaces $L_1, \ldots, L_{m-1}$, we have $a_{0q} = 0$ for $q = 1, 2, \ldots, r_{m-1}$. By (25), $a_{iq} = 0$ for every $i$. Hence every $x_i$ is $B$-orthogonal to the same subspaces and by the extremum principle of section 2 we have

$$\mu_i \geq \lambda_m.$$

Now for each $q = r_{m-1}+1, \ldots, r_m$ consider the sequence $\{a_{iq}\}$. For $i = 0$, $a_{0q} \geq 0$, with the strict inequality holding for at least one value of $q$. From (25) it follows that the sequence is nonnegative and nondecreasing, since the term in braces is not less than 1 for the present range of $q$ by (20), $\nu_p > 0$, and the last displayed inequality. The sequence is also bounded, since $|x_i|$ is, by Theorem 1. Thus

$$\lim_{i\to\infty} a_{iq} = e_q \geq 0, \qquad q = r_{m-1}+1, \ldots, r_m, \qquad (27)$$

with at least one limit positive.

Let $y'$ be an arbitrary accumulation point of $\{x_i\}$ (there is at least one). Let $\{x_i'\}$ be a subsequence converging to $y'$. By Theorem 1, $y'$ is a characteristic vector of (1). In addition it must belong to $L_m$, for otherwise

$$a_{iq}' = (y_q, Bx_i') \to (y_q, By') = 0,$$

contrary to (27). It now follows that $a_{ip}' \to 0$ for $p$ outside the range of $q$, as in (27). Thus

$$y' = \sum_{q=r_{m-1}+1}^{r_m} e_q y_q.$$

Since $y'$ was an arbitrary limit point we have the desired convergence of $x_i$ to a vector $y$ in $L_m$. Finally,

$$\lim_{i\to\infty}\mu_i = \lim_{i\to\infty}\frac{(x_i, Ax_i)}{(x_i, Bx_i)} = \frac{(y, Ay)}{(y, By)} = \lambda_m.$$

Corollary. The condition (23) is satisfied if (1) $G = B$, or (2) $G = I$ and $AB = BA$.

This is easily verified. The case $G = B = I$ with $A$ real symmetric was studied in greater detail in the paper by the present authors referred to earlier.
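Case (1) of the Corollary, $G = B$, can be tried on a small constructed problem. The sketch below is an illustration only: the characteristic values are prescribed as 1, 2, ..., 5 by building $A = L\,\mathrm{diag}(\lambda)\,L^*$ with $B = LL^*$, and the step is a constant taken as 0.9 times an assumed reading of the bound of section 6, namely $2m(G)/(M(B)(\lambda_k - \lambda_1))$. The spread $\lambda_k - \lambda_1$ is available here only because the test problem was built with known roots.

```python
import numpy as np

def inner(x, y):
    return np.vdot(y, x)

def rayleigh(A, B, x):
    return (inner(x, A @ x) / inner(x, B @ x)).real

rng = np.random.default_rng(5)
n = 5
lam_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # prescribed characteristic values
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = np.eye(n) + 0.05 * (N @ N.conj().T) / n       # well-conditioned, positive definite
L = np.linalg.cholesky(B)
A = L @ np.diag(lam_true) @ L.conj().T            # A y = lam B y has roots lam_true

G = B                                             # case (1) of the Corollary
eig_B = np.linalg.eigvalsh(B)
m_G, M_B = eig_B[0], eig_B[-1]                    # m(G) and M(B) of (5), with G = B
spread = lam_true[-1] - lam_true[0]
alpha = 0.9 * 2 * m_G / (M_B * spread)            # constant step (assumed bound)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
mus = [rayleigh(A, B, x)]
for _ in range(300):
    eta = np.linalg.solve(G, A @ x - mus[-1] * (B @ x))   # eq. (9)
    x = x - alpha * eta                                   # eq. (17)
    mus.append(rayleigh(A, B, x))

monotone = all(b <= a + 1e-12 for a, b in zip(mus, mus[1:]))
```

For a random starting vector, which is almost surely not $B$-orthogonal to $L_1$, the quotients decrease toward the least characteristic value.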

6. Construction of the Sequence $\{\alpha_i\}$

We shall describe two methods of constructing this sequence so that condition (20) is satisfied.
Lemma 2. Let the real number $b_2$ satisfy

$$0 < b_2 < \frac{2\,m(G)}{M(B)(\lambda_k - \lambda_1)}, \qquad (28)$$

where $\lambda_k - \lambda_1$ is the spread of the characteristic values of (1) and the other quantities are defined by (5). Then there is a constant $c_1 > 0$ such that for every $x \neq 0$ with $\eta \neq 0$ we have

$$f(x, \alpha) \geq c_1\alpha \ \text{ for } 0 \leq \alpha \leq b_2, \qquad f(x, \alpha) \leq c_1\alpha \ \text{ for } -b_2 \leq \alpha \leq 0.$$

Let $x$ and $\alpha$ satisfy the required conditions. Assume further that $\alpha \neq 0$. Then we may write (16) in the form

$$\frac{f(x, \alpha)}{\alpha} = \frac{2 - ps\alpha}{(x - \alpha\eta, B(x - \alpha\eta))/(x, Bx)}. \qquad (29)$$

Now

$$|ps\alpha| \leq \frac{M(B)}{m(G)}(\lambda_k - \lambda_1)\,b_2 = b_3 < 2,$$

where we have used (5) and the extremum property of $\lambda_1$ and $\lambda_k$. Thus the numerator on the right side of (29) is at least the positive number $2 - b_3$. Our proof will be complete if we can show that the corresponding (positive) denominator is bounded uniformly in $x$ and $\alpha$. By (5) it is sufficient to show $|\eta|/|x|$ is bounded. But this is an immediate consequence of (9).

As a consequence of Lemma 2 we have the following result.

Theorem 5. Let the sequence $\{\alpha_i\}$ of real numbers be such that

$$0 < b_1 \leq \alpha_i \leq b_2 \qquad (30)$$

for constants $b_1$, $b_2$ with the latter as in (28). Then this sequence satisfies condition (20).

Our second method of prescribing $\alpha = \alpha(x)$ stems from the idea of maximizing $f(\alpha) = f(x, \alpha)$ as a function of $\alpha$, hence choosing $\alpha$ as a zero of $f'(\alpha)$. A simple calculation leads to

$$f'(\alpha) = \frac{2[(pqs - r)\alpha^2 - ps\alpha + 1]}{(1 - 2q\alpha + r\alpha^2)^2}, \qquad x \neq 0,\ \eta \neq 0. \qquad (31)$$

A function $\alpha(x)$ is now defined as follows. Choose an arbitrary fixed positive constant $b_4$. Let

$$\alpha(x) = \begin{cases} \text{first zero of } f'(\alpha) \text{ on } 0 < \alpha < b_4, \\ b_4, \ \text{if no such zero exists,} \end{cases} \qquad (x \neq 0,\ \eta \neq 0). \qquad (32)$$

This function is computationally simple; its construction involves only the solution of a quadratic equation and a comparison of numbers.
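The step rule just described can be sketched as below. This is a hedged illustration with NumPy: `alpha_of_x` and `coefficients` are hypothetical names, $G = I$ and $b_4 = 10$ are arbitrary choices, and the zeros of the quadratic numerator $(pqs - r)\alpha^2 - ps\alpha + 1$ of $f'(\alpha)$ are found with `np.roots` rather than by the closed formula. The loop stops early if the gradient is numerically null.

```python
import numpy as np

def inner(x, y):
    return np.vdot(y, x)

def rayleigh(A, B, x):
    return (inner(x, A @ x) / inner(x, B @ x)).real

def coefficients(A, B, G, x, eta, mu):
    # p, q, r, s of (16) for the current x and its gradient eta.
    eBe = inner(eta, B @ eta).real
    xBx = inner(x, B @ x).real
    p = eBe / inner(eta, G @ eta).real
    q = inner(x, B @ eta).real / xBx
    r = eBe / xBx
    s = rayleigh(A, B, eta) - mu
    return p, q, r, s

def f(alpha, p, q, r, s):
    return alpha * (2 - p * s * alpha) / (1 - 2 * q * alpha + r * alpha ** 2)

def alpha_of_x(p, q, r, s, b4):
    # First zero of f'(alpha) in (0, b4), else b4; np.roots drops a zero leading term.
    roots = np.roots([p * q * s - r, -p * s, 1.0])
    admissible = [t.real for t in roots if abs(t.imag) < 1e-12 and 0 < t.real < b4]
    return min(admissible) if admissible else b4

rng = np.random.default_rng(6)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)
G, b4 = np.eye(n), 10.0
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

mus, first_step_ok = [rayleigh(A, B, x)], None
for i in range(100):
    mu = mus[-1]
    eta = np.linalg.solve(G, A @ x - mu * (B @ x))
    if inner(eta, G @ eta).real < 1e-24:       # x is numerically characteristic
        break
    p, q, r, s = coefficients(A, B, G, x, eta, mu)
    alpha = alpha_of_x(p, q, r, s, b4)
    if i == 0:
        # f should be non-decreasing on [0, alpha(x)].
        grid = np.linspace(0.0, alpha, 50)
        first_step_ok = all(f(t, p, q, r, s) <= f(alpha, p, q, r, s) + 1e-12 for t in grid)
    x = x - alpha * eta
    mus.append(rayleigh(A, B, x))

monotone = all(b <= a + 1e-10 for a, b in zip(mus, mus[1:]))
```

Because $f(0) = 0$ and $f$ does not decrease up to the chosen step, each step gives $f \geq 0$, so the quotients $\mu_i$ do not increase.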

Theorem 6. For a given constant $b_4 > 0$ and a given initial vector $x_0 \neq 0$ determine the sequence $\{\alpha_i\}$ by means of the iteration formula (17) and the equation

$$\alpha_i = \begin{cases} \alpha(x_i) & \text{if } \eta_i \neq 0, \\ b_4 & \text{if } \eta_i = 0, \end{cases}$$

where $\alpha(x)$ is given by (32). Then this sequence satisfies condition (20).

For the proof several properties of the function (32) are established. The coefficients of the quadratic expression in the numerator in (31) are uniformly bounded in $x$. This is a consequence of (9) and (5). It follows that its zeros are uniformly bounded away from 0. Hence, there is a number $b_3 > 0$ such that

$$b_3 \leq \alpha(x) \leq b_4.$$

Thus the first condition of (20) is fulfilled. Now $f(\alpha)$ is a non-decreasing function on $0 \leq \alpha \leq \alpha(x)$; for, by $f'(0) = 2$ and (32) its derivative is non-negative on this interval. Select $b_2 < b_3$ so that (28) holds. Then

$$f(\alpha(x)) \geq f(b_2) \geq c_1 b_2$$

uniformly in $x$. From this we see that the second condition of (20) also holds. The proof is complete.

7. Obtaining Further Characteristic Vectors. 

Suppose that a characteristic vector $y'$ with characteristic value $\lambda'$ is known. We propose to show how the preceding iteration scheme may be modified so as to secure a new, independent characteristic vector. The procedure will be to start with an initial vector $x_0$ which is $B$-orthogonal to $y'$ and maintain this orthogonality at each step of the iteration.

Thus, let

$$z = By' \qquad (33)$$

and suppose we have a vector $x \neq 0$ such that $(x, z) = 0$. We wish our next approximation $x - \alpha\zeta$ to be orthogonal to $z$, i.e., we require

$$(\zeta, z) = 0. \qquad (34)$$

In order to select the direction $\zeta$ in an optimal manner, according to section 3, it is chosen proportional to the solution of (8) with the side condition (10), $z$ as in (33). We thereby determine $\zeta$ by (11) and (12). Notice that by (11),

$$(\zeta, G\zeta) = (\zeta, \xi), \qquad (x, G\zeta) = 0 \qquad (35)$$

using (34) and (7). Now suppose $\zeta = \zeta(x) \neq 0$. Then a straightforward calculation using the first equation of (35) shows that equations (15) and (16) are valid with $\eta$ everywhere replaced by $\zeta$.



The iteration formula (17) is now replaced by

$$x_{i+1} = x_i - \alpha_i\zeta_i \qquad (36)$$

with the corresponding formula (18) valid by the second equation of (35). Our present sequence however has the additional property

$$(x_i, By') = 0. \qquad (37)$$

An examination of section 4 shows that with one necessary verification, to be remarked on soon, the three theorems of that section remain valid. We may now add, however, that the characteristic accumulation vectors $y$ in Theorems 1 and 2 are $B$-orthogonal to $y'$, and that the vectors $y_i$ of Theorem 3 have the same property. The required verification is to establish the equivalence of $\zeta(x) = 0$ with $\eta(x) = 0$ and $\zeta_i \to 0$ with $\eta_i \to 0$. Here, of course, $\zeta$ is given by (11). We shall prove only the second equivalence; this will suggest the proof of the first. That $\zeta_i \to 0$ when $\eta_i \to 0$ is immediate from (9), (11) and (12). Suppose $\zeta_i \to 0$. We note first that $(\xi_i, y') = 0$; this follows from (7), (37) and the fact that $y'$ is characteristic. Hence by (11)

$$(G\zeta_i, y') = h_i(By', y'),$$

where $h_i$ of (12) has the obvious meaning. Hence $h_i \to 0$. It follows by (11) that $\xi_i \to 0$ and hence $\eta_i \to 0$, as desired.

The constructions of section 6 remain valid under the present iteration (36). It is only necessary to verify the uniform boundedness of $|\zeta|/|x|$, $x \neq 0$, $\zeta \neq 0$. This is an immediate consequence of (11), (12) and (9).

To maintain the validity of Theorem 4 of section 5 in the present context it is not necessary to modify the iteration procedure from (17) to (36). The earlier method is adequate. For, by (25), we see that if the initial vector $x_0$ is $B$-orthogonal to $y'$, so is each vector $x_i$, and hence the limit vector $y$.

If several characteristic vectors $y', y'', \ldots$ with characteristic values $\lambda', \lambda'', \ldots$ are known then the iteration (36) is to be used with $\zeta$ determined by (13) and (14), where $z_1 = By'$, $z_2 = By''$, .... The resulting sequence $\{x_i\}$ will be $B$-orthogonal to the known characteristic vectors and hence all limit vectors will have this property. It may be easily verified that the preceding remarks concerning the validity of the results of the previous sections remain in force.

The iteration $x - \alpha\zeta$ determined by (13) and (14) is theoretically equivalent to the following procedure. First form the vector $x' = x - \alpha\eta = x - \alpha G^{-1}\xi$, and then determine $k_1, k_2, \ldots$ so that

$$x'' = x' + k_1G^{-1}z_1 + k_2G^{-1}z_2 + \cdots \qquad (38)$$

is $B$-orthogonal to $y', y'', \ldots$. It is easy to verify that $x'' = x - \alpha\zeta$. However the procedure just described has the computational advantage that the vector $x_i$ formed at each stage is accurately $B$-orthogonal to the known characteristic vectors. In the alternative procedure (36), $B$-orthogonality of $x_i$ may be gradually lost through round-off errors (although this may be remedied by the additional work of occasional $B$-orthogonalization).

It would also be extremely convenient for the determination of the $k$'s in (38) (or the $h$'s in (13)) to have $(G^{-1}z_j, z_m) = 0$ for $j \neq m$. This may be achieved by successively orthogonalizing as additional characteristic vectors are accumulated. Suppose $y''$ has been calculated with $y'$ known. Then with $z_1 = By'$, define $z_2$ by

$$z_2 = By'' + l z_1$$

with $l$ chosen so that $(z_2, G^{-1}z_1) = 0$. Then the $G^{-1}$-orthogonal set $(z_1, z_2)$ may be used in (38) (or (13)). Suppose now that a third independent characteristic vector $y'''$ is calculated. Put

$$z_3 = By''' + l_1z_1 + l_2z_2$$

with $l_1$, $l_2$ chosen so that $(z_3, G^{-1}z_1) = (z_3, G^{-1}z_2) = 0$; this determination is simplified by the $G^{-1}$-orthogonality of $(z_1, z_2)$. The new $G^{-1}$-orthogonal set $(z_1, z_2, z_3)$ may now be used in (38). The extension to more vectors is clear.
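The two-stage procedure of (38), an unconstrained step followed by a correction along the vectors $G^{-1}z_j$, can be sketched as follows. This is an illustration under assumptions: a test problem with prescribed characteristic values 1, ..., 5 (built as $A = L\,\mathrm{diag}(\lambda)\,L^*$, $B = LL^*$), $G = I$, one known characteristic vector $y'$ for the value 1, and a constant step taken as 0.9 times an assumed reading of bound (28). The name `project` is hypothetical.

```python
import numpy as np

def inner(x, y):
    return np.vdot(y, x)

def rayleigh(A, B, x):
    return (inner(x, A @ x) / inner(x, B @ x)).real

rng = np.random.default_rng(7)
n = 5
lam_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = np.eye(n) + 0.05 * (N @ N.conj().T) / n
L = np.linalg.cholesky(B)
A = L @ np.diag(lam_true) @ L.conj().T
# Known characteristic vector for the value 1: it satisfies L* y = e_1.
y1 = np.linalg.solve(L.conj().T, np.eye(n)[:, 0])

G = np.eye(n)
Z = (B @ y1).reshape(n, 1)             # columns z_j = B y^(j), eq. (33)
GinvZ = np.linalg.solve(G, Z)
gram = Z.conj().T @ GinvZ              # entries (G^{-1} z_j, z_m)

def project(x):
    # Choose the k's of (38) so that (x'', z_m) = 0 for every m.
    k = np.linalg.solve(gram, -Z.conj().T @ x)
    return x + GinvZ @ k

M_B = np.linalg.eigvalsh(B)[-1]
alpha = 0.9 * 2 * 1.0 / (M_B * (lam_true[-1] - lam_true[0]))   # m(G) = 1 for G = I

x = project(rng.standard_normal(n) + 1j * rng.standard_normal(n))
worst = 0.0
for _ in range(500):
    mu = rayleigh(A, B, x)
    eta = np.linalg.solve(G, A @ x - mu * (B @ x))
    x = project(x - alpha * eta)       # x' = x - alpha*eta, then correction (38)
    worst = max(worst, abs(inner(x, B @ y1)) / np.linalg.norm(x))
mu_final = rayleigh(A, B, x)
```

With the least characteristic vector removed in this way, the quotient is expected to settle on the next characteristic value, here 2, while $B$-orthogonality to $y'$ is restored at every step.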

8. Problems Equivalent to (1) 

We leave now the calculation of characteristic vectors and values and raise the question of when a general problem of the type

$$Cx = \lambda Dx, \quad \text{with } |C - \lambda D| \not\equiv 0 \text{ in } \lambda, \qquad (39)$$

is equivalent to one of type (1), that is, one with $A$ Hermitian and $B$ positive definite. For the moment we impose no additional conditions on the complex matrices $C$ and $D$. Clearly (39) has at most $n$ characteristic roots (including the possible real value $\lambda = \pm\infty$), where $n$ is the order of the matrices. Consider a second problem

$$Rx = \lambda Sx. \qquad (40)$$

By asserting that problems (39) and (40) are equivalent we mean that there is a nonsingular matrix $K$ and a one-to-one correspondence between the distinct characteristic values of (39) and those of (40) with the following property: if $\lambda'$ of (39) corresponds to $\lambda''$ of (40), then $y'$ is a characteristic vector of (39) belonging to $\lambda'$ if and only if $y'' = Ky'$ is a characteristic vector of (40) belonging to $\lambda''$. If $b$ is a number such that $|C - bD| \neq 0$, then (39) is equivalent to $(C - bD)x = (\lambda - b)Dx$, and hence to

$$Ex = \nu Fx, \quad \text{with } E = D,\ F = C - bD,\ \nu = \frac{1}{\lambda - b},\ |F| \neq 0. \qquad (41)$$

Notice that $\nu = \infty$ is not a characteristic value of (41).
Lemma 2. Let $\lambda_j$, $j = 1, 2, \ldots, k$, be the distinct characteristic values of (39) and let $H$ be the space spanned by the spaces $L_j = L(\lambda_j)$. Then

$$\sum_j \dim L(\lambda_j) = \dim H.$$

It is sufficient to make the proof for the equivalent problem (41), where $\nu_j = 1/(\lambda_j - b)$ and $L_j = L(\nu_j)$. It may be assumed that the characteristic values are so ordered that if $\nu = 0$ is a characteristic value, then $\nu_1 = 0$. Let $H_j$ be the space spanned by $L_1, L_2, \ldots, L_j$. Any two spaces $L_j$ have only the null vector in common. It follows that the statement

$$\sum_{i=1}^{j} \dim L_i = \dim H_j \qquad (42)$$

is valid for $j = 2$. Assume that (42) holds for $j = m < k$; we shall show that it is then valid for $j = m + 1$. If (42) were false for $j = m + 1$, then there would exist a vector $y_{m+1} \neq 0$ of $L_{m+1}$ such that

$$y_{m+1} = y_1 + y_2 + \cdots + y_m, \qquad y_i \text{ in } L_i, \quad i = 1, 2, \ldots, m,$$

with not all the terms on the right null. From $Ey_i = \nu_iFy_i$ is obtained

$$Ey_{m+1} = \nu_1Fy_1 + \cdots + \nu_mFy_m.$$

But $Ey_{m+1} = \nu_{m+1}Fy_{m+1}$. Since $|F| \neq 0$,

$$\nu_{m+1}y_{m+1} = \nu_1y_1 + \cdots + \nu_my_m,$$

so that

$$\left(\frac{\nu_1}{\nu_{m+1}} - 1\right)y_1 + \cdots + \left(\frac{\nu_m}{\nu_{m+1}} - 1\right)y_m = 0,$$

since $\nu_{m+1} \neq 0$. But (42) holds for $j = m$. Hence each term on the left above must vanish. At least one $y_i$, say $y_l$, is not null. Then $\nu_{m+1} = \nu_l$, $l \leq m$. This contradiction completes the proof.

From the lemma it is clear that the condition

$$\sum_j \dim L_j = n \qquad (43)$$

for (39) is equivalent to the assertion that every vector may be written uniquely, apart from order of terms, as a sum of characteristic vectors belonging to distinct characteristic values.

Our main theorem is the following.

Theorem 8. For a problem (39) each of the following conditions implies the other.

I. Problem (39) is equivalent to one of type (1).
II. The characteristic values of (39) are real and (43) holds.
III. There exists a positive definite matrix $P$ such that $CPD^*$ is Hermitian.

That II follows from I is a consequence of the well-known fact that for problem (1) condition II is valid. To show that II implies III we construct the nonsingular matrix $Y$ whose columns are the $n$ linearly independent (by Lemma 2) characteristic vectors of (41). Thus $EY = FY\Lambda$, where $\Lambda$ is a real diagonal matrix comprising the characteristic roots of (41). Hence $\Lambda = Y^{-1}F^{-1}EY$ is Hermitian, that is

$$Y^{-1}F^{-1}EY = Y^*E^*(F^*)^{-1}(Y^*)^{-1}.$$

Hence

$$EPF^* = FPE^*, \qquad P = YY^*. \qquad (44)$$

From the definition of $E$ and $F$ in (41) is obtained

$$DPC^* = CPD^* \qquad (45)$$

as desired.

Finally, to show that III implies I we observe first that (45) implies (44). Problem (41) is equivalent to

$$EPF^*z = \nu FPF^*z, \qquad x = PF^*z,$$

which is of type (1). This completes the proof.

Corollary. The following condition may be added to those of Theorem 8.

IV. There exists a positive definite matrix $P$ such that $C^*PD$ is Hermitian.

For the proof we need only observe that II holds for (39) if and only if it holds for

$$C^*x = \lambda D^*x,$$

and then apply III to this problem.

If our theory is limited to real vectors over the field of real numbers, then the following specializations occur. The matrices $A$ and $B$ are to be taken real symmetric and the matrices $C$ and $D$ are to be taken real. The matrix $P$ of III and IV is real symmetric and the condition of reality in II is superfluous.

If the matrix $P$ of III or IV is known then the transformation of (39) to (1) involves only matrix multiplications and hence is computationally feasible. For example, in case of III with $|D| \neq 0$ we write

$$CPD^*z = \lambda DPD^*z, \qquad x = PD^*z.$$

From the solutions $z$ of this problem we obtain the solutions $x$ of (39) by a direct matrix transformation. If $|C| \neq 0$, we write

$$DPC^*z = \rho\,CPC^*z, \qquad \lambda = \frac{1}{\rho}, \qquad x = PC^*z.$$

In the case of IV with, say, $|D| \neq 0$, we write

$$D^*PCx = \lambda D^*PDx$$

which is of type (1) with solutions exactly those of (39). In the case of either III or IV with $|C| = |D| = 0$, we first transform to (41) and then apply the preceding technique indicated for $|D| \neq 0$.
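The transformation for case III with $|D| \neq 0$ can be illustrated on a constructed example. In the sketch below (NumPy, with all data assumed for illustration) a non-Hermitian pencil with real characteristic values is built as $C = DY\Lambda Y^{-1}$, and $P = YY^*$ is taken as the positive definite matrix, so that $CPD^* = DY\Lambda Y^*D^*$ is Hermitian. The type (1) problem $CPD^*z = \lambda DPD^*z$ is then solved through a Cholesky factorization, used here only as a convenient reference solver, and $x = PD^*z$ is recovered.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
lam_true = np.array([-1.0, 0.5, 2.0, 3.0])        # prescribed real characteristic values

def near_identity():
    # Well-conditioned nonsingular complex matrix (illustrative construction).
    R = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.eye(n) + 0.1 * R

Y, D = near_identity(), near_identity()
C = D @ Y @ np.diag(lam_true) @ np.linalg.inv(Y)  # C x = lam D x has roots lam_true
P = Y @ Y.conj().T                                # positive definite

A1 = C @ P @ D.conj().T                           # C P D*, Hermitian by construction
B1 = D @ P @ D.conj().T                           # D P D*, positive definite
herm_defect = np.linalg.norm(A1 - A1.conj().T)

# Solve A1 z = lam B1 z via B1 = L L* (reference solver for the type (1) problem).
L = np.linalg.cholesky((B1 + B1.conj().T) / 2)
Linv = np.linalg.inv(L)
S = Linv @ A1 @ Linv.conj().T
lam, W = np.linalg.eigh((S + S.conj().T) / 2)
Zsol = Linv.conj().T @ W                          # columns z
X = P @ D.conj().T @ Zsol                         # x = P D* z

# Residual of the original problem C x = lam D x, column by column.
residual = np.linalg.norm(C @ X - D @ X @ np.diag(lam))
```

The recovered values agree with the prescribed ones, and the columns of `X` satisfy the original, non-Hermitian problem up to rounding.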

Los Angeles, September 11, 1950. 